feat: Adds intelligent tiered model routing #47
Merged
veerareddyvishal144 merged 17 commits into main on Feb 22, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Codex Bash mapping: shell_command → shell (array format for command)
- Add missing Codex mappings: TodoWrite → update_plan, WebSearch → web_search
- Add two-layer tool filtering for IDE clients:
  - Layer 1: IDE_SAFE_TOOLS removes AskUserQuestion (can't work through proxy)
  - Layer 2: CLIENT_TOOL_MAPPINGS per-client filter ensures each client only sees tools it supports (e.g. Codex gets 8, Claude Code gets 14)
- Add tool name mapping to chat/completions response paths (streaming + non-streaming)
- Add missing Claude Code tools: MultiEdit, LS, NotebookRead
- Inject filtered tools in openai-router.js before orchestrator call to prevent providers from injecting full STANDARD_TOOLS
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
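The two-layer filtering described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's code: the tool list is shortened, `filterToolsForClient` is a hypothetical helper name, and the fallback for unknown clients is an assumption. Only the three Codex mappings named in the commit message are taken from the source.

```javascript
// Illustrative tool list only; the project's real STANDARD_TOOLS is larger.
const STANDARD_TOOLS = ["Bash", "Read", "Write", "TodoWrite", "WebSearch", "AskUserQuestion"];

// Layer 1: tools that can work through the proxy (AskUserQuestion cannot).
const IDE_SAFE_TOOLS = new Set(STANDARD_TOOLS.filter((name) => name !== "AskUserQuestion"));

// Layer 2: per-client name mappings. Only the three Codex entries named in
// the commit message are shown; the rest of the table is omitted here.
const CLIENT_TOOL_MAPPINGS = {
  codex: { Bash: "shell", TodoWrite: "update_plan", WebSearch: "web_search" },
};

// Hypothetical helper: returns the tools a client may see, with the name
// the client expects attached to each one.
function filterToolsForClient(client, tools = STANDARD_TOOLS) {
  const safe = tools.filter((name) => IDE_SAFE_TOOLS.has(name));
  const mapping = Object.hasOwn(CLIENT_TOOL_MAPPINGS, client)
    ? CLIENT_TOOL_MAPPINGS[client]
    : null;
  // Assumption: an unknown client gets the IDE-safe tools under their original names.
  if (!mapping) return safe.map((name) => ({ name, clientName: name }));
  return safe
    .filter((name) => Object.hasOwn(mapping, name))
    .map((name) => ({ name, clientName: mapping[name] }));
}
```

With these sample tables, `filterToolsForClient("codex")` keeps Bash, TodoWrite, and WebSearch and exposes them as shell, update_plan, and web_search. Running Layer 1 before Layer 2 means a per-client table can never re-admit a tool that cannot work through the proxy.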
- Demote 22 info→debug in openai-router.js (request previews, tool injection, streaming chunks, intermediate conversions)
- Demote 39 info→debug in databricks.js (tool injection, request construction, response parsing across all providers)
- Clean up orchestrator/index.js: consolidate Ollama conversational check (6→1 log), headroom compression (4→1), tool execution mode (4→1); remove 4 console.log artifacts and [CONTEXT_FLOW] scaffolding
- Fix tier config: change hard throw to graceful warn when TIER_* env vars missing (was crashing CI)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
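The tier config fix (warn instead of throw when TIER_* variables are missing) might look something like the sketch below. It assumes the provider:model value format used by the TIER_* variables in this PR; `parseTierConfig` and its `warn` parameter are hypothetical names, and the real module may handle invalid values differently.

```javascript
const TIER_NAMES = ["SIMPLE", "MEDIUM", "COMPLEX", "REASONING"];

// Hypothetical parser: reads TIER_* values from an env-like object and
// returns only the tiers that are set and well-formed.
function parseTierConfig(env, warn = console.warn) {
  const tiers = {};
  for (const name of TIER_NAMES) {
    const raw = (env[`TIER_${name}`] || "").trim();
    // Split on the first colon only, so model names that themselves
    // contain a colon stay intact.
    const sep = raw.indexOf(":");
    if (sep <= 0 || sep === raw.length - 1) {
      // Graceful path: warn and skip instead of throwing.
      warn(`TIER_${name} is missing or not in provider:model format; skipping`);
      continue;
    }
    tiers[name] = { provider: raw.slice(0, sep), model: raw.slice(sep + 1) };
  }
  return tiers;
}
```

Called as `parseTierConfig(process.env)`, this returns an object keyed by tier name. A CI environment with no TIER_* variables gets an empty object and four warnings instead of a crash.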
Adds optional persistent log file rotation via pino-roll (LOG_FILE_ENABLED=true) and expands the Structured Logging section in production.md with file logging config, log level philosophy, and querying examples.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds missing sections to all config files: file logging (LOG_FILE_*), rate limiting, policy, agents, token optimization, smart tool selection, prompt/semantic cache, tiered routing, and provider configs (LM Studio, Z.AI, Vertex AI). Adds /app/logs volume for persistent log rotation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Moonshot AI as first-class provider (invokeMoonshot, config, orchestrator, provider discovery)
- Fix stop_reason detection: check tool_calls presence instead of finish_reason string
- Fix streaming format mismatch: force non-streaming for OpenAI-format providers
- Fix reasoning content handling: use content field, fall back to reasoning_content
- Fix orchestrator double-conversion for Moonshot responses
- Fix force-local routing to respect TIER_SIMPLE config instead of hardcoding Ollama
- Remove dead code: determineProviderSync (unused sync routing fallback)
- Update routing docs: clear precedence hierarchy for TIER_* vs MODEL_PROVIDER vs PREFER_OLLAMA
- Add comprehensive Moonshot documentation across all doc files
- Add Moonshot to model-tiers.json (kimi-k2-thinking for REASONING tier)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
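Two of the fixes in this commit, stop_reason detection and reasoning content handling, can be illustrated with a short sketch. It assumes an OpenAI-style choice object of the shape `{ finish_reason, message: { content, reasoning_content, tool_calls } }`. The function names are hypothetical, and the mapping of `finish_reason: "length"` to `max_tokens` is an assumption about the conversion, not something the PR states.

```javascript
// Decide the Anthropic-style stop_reason from the presence of tool calls,
// not from the finish_reason string, which some providers report as "stop"
// even when tool_calls are returned.
function detectStopReason(choice) {
  const toolCalls = choice?.message?.tool_calls;
  if (Array.isArray(toolCalls) && toolCalls.length > 0) return "tool_use";
  // Assumption: "length" maps to max_tokens; everything else ends the turn.
  return choice?.finish_reason === "length" ? "max_tokens" : "end_turn";
}

// Prefer the content field; use reasoning_content only when content is
// empty, so chain-of-thought does not leak into normal output.
function extractText(message) {
  const content = typeof message?.content === "string" ? message.content : "";
  if (content.trim() !== "") return content;
  return typeof message?.reasoning_content === "string" ? message.reasoning_content : "";
}
```

For a response with `finish_reason: "stop"` and a non-empty `tool_calls` array, `detectStopReason` returns `"tool_use"`, which is the case the fix targets.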
…ineProviderSync
Replace all determineProviderSync() calls in tests with async determineProviderSmart(), since the sync function was removed as dead code.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…chestrator)
- Add missing logger require in src/api/router.js (used in streaming error handling)
- Fix clean.model → cleanPayload.model in orchestrator hybrid mode response
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
This PR adds intelligent tiered model routing, new provider integrations (Moonshot AI), and significant improvements to routing infrastructure, documentation, and DevOps tooling.
New Providers
- Moonshot AI as a first-class provider (invokeMoonshot). Includes model mapping, native system role support, tool calling, thinking model support (kimi-k2-thinking), and non-streaming mode.
4-Tier Intelligent Routing System
- TIER_* env vars (TIER_SIMPLE, TIER_MEDIUM, TIER_COMPLEX, TIER_REASONING) in provider:model format override MODEL_PROVIDER for routing
Bug Fixes
- stop_reason detection: check for actual tool_calls array presence instead of the finish_reason string. Fixes tool calls not executing with Moonshot (and potentially other providers that return finish_reason: "stop" with tool_calls)
- Streaming format mismatch: force stream: false for OpenAI-format providers (Moonshot, Azure OpenAI), since OpenAI SSE to Anthropic SSE conversion is not implemented
- Reasoning content handling: use the content field directly and fall back to reasoning_content only when content is empty. Fixes thinking model chain-of-thought leaking into CLI output
- Force-local routing: respect TIER_SIMPLE config instead of hardcoding Ollama when a force-local pattern matches
Routing Precedence (Documented)
- … TIER_* set … MODEL_PROVIDER ignored for routing.
- … TIER_* set … MODEL_PROVIDER used.
- … TIER_* set … MODEL_PROVIDER.
- PREFER_OLLAMA … TIER_SIMPLE=ollama:<model>.
Code Cleanup
- Removed determineProviderSync(): dead code, no call sites
- … PREFER_OLLAMA with runtime warning pointing to TIER_* vars
New Routing Modules
- src/routing/model-tiers.js: … TIER_* env var parsing, model selection
- src/routing/agentic-detector.js
- src/routing/cost-optimizer.js
- src/routing/model-registry.js
- src/routing/complexity-analyzer.js
Documentation
- routing.md: New comprehensive routing docs with precedence hierarchy, scoring algorithm, agentic detection, cost optimization, decision flow
- providers.md: Added Moonshot section (claude code >= 2.1.9 no longer works #10), updated configuration methods with clear TIER_* vs MODEL_PROVIDER explanation
- troubleshooting.md: Added Moonshot troubleshooting (rate limits, auth, reasoning content)
- installation.md: Added Moonshot quick start
- faq.md: Updated provider counts, added Moonshot recommendations
- .env.example: Added Moonshot config, expanded MODEL_PROVIDER comments explaining its role with tier routing
DevOps
Config Files
- config/model-tiers.json: Tier preferences for all providers including Moonshot (kimi-k2-thinking for REASONING)
- .env.example: Full Moonshot section, expanded routing documentation in comments
Test Plan
- … MODEL_PROVIDER=moonshot and valid MOONSHOT_API_KEY
- … stop_reason: "tool_use" set correctly when tool_calls present
- … kimi-k2-thinking) returns clean output without chain-of-thought